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Abstract: This paper presents an innovative access control system, based on human 
detection and path analysis, to reduce false automatic door system actions while increasing 
the added values for security applications. The proposed system can first identify a person 
from the scene, and track his trajectory to predict his intention for accessing the entrance, 
and finally activate the door accordingly. The experimental results show that the proposed 
system has the advantages of high precision, safety, reliability, and can be responsive to 
demands, while preserving the benefits of being low cost and high added value. 

Keywords: computer vision; curve fitting; Gaussian distribution; human detection; 
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1. Introduction 

Automatic entrance/exit door control is widely used in public places such as grocery stores, 
businesses, transportation stations, airports, and wholesale department stores to eliminate the need of 
manually opening and closing actions. Contemporary sensor-based automatic door control 
technologies include infrared, ultrasonic/radio, or other wireless sensing methods. The first can be 
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further divided into active and passive approaches. The active process emits infrared signals from the 
controller and captures the reflected signals to determine if there is any object close to the door. This 
approach is accurate and capable of identifying the position and the speed of the object but its high cost 
has made it less popular. The passive approach detects the infrared signals radiated by people and is the 
most widely used for being simple, effective, and low cost. The ultrasonic/radio approach, on the other 
hand, emits ultrasonic or radio waves to scan the environment and analyzes the returned signals for door 
access control. 

Although these techniques are all successful in detecting objects, they are not capable of 
understanding the type and the intention of the objects. For instance, a puppy or a passing pedestrian may 
accidentally trigger the door and cause a false opening action. Frequent false action is not only annoying, 
and results in air conditioning energy waste, but also reduces equipment lifetime. This calls for the need 
of an automatic door control system based on the detection and intention analysis of people. 

In this paper, door control is based on the confirmation that the detected object is indeed a human and 
the corresponding movement trajectory also indicates that he/she has the intention to go through the 
entrance. Furthermore, an infrared function has been added to prevent people from being trapped by the 
door before they leave the passage. In addition, the captured images can also be saved for other 
applications such as customer analysis and crime investigation. 

This paper is organized as follows: Section 2 describes the system design concept and related 
theoretical basis. The corresponding hardware implementation architecture and the experimental results, 
as well as the performance analysis and discussions, on various field tests are described in Section 3. 
Finally, the conclusion and future work are made in the last section. 

2. Design Concept and Principal Theory 

The proposed innovative door control system is mainly based on human detection and intention 
analysis. The first part involves face detection or contour detection that identifies whether the detected 
object is a person, while the latter includes trajectory tracking and statistical analysis for intention 
estimation. A flowchart of the control procedures is shown in Figure 1 . 

It is known that face detection with reasonable detection rates had been well developed in the 
literature and has the closest relevance to human characteristics, thus is good for people identification. In 
short, once a person is detected in the region of interest (ROI), the trajectory of his face can be tracked 
and then analyzed by a statistical analyzer to calculate the corresponding cumulative probability, and 
will be used as the estimation of the intention. 

On the other hand, in the door-closing procedure of Figure 1, the door is closed if no object is 
detected in the passage on both sides of the door. Overall, processes of the state transition, 
human detection, intention analysis, and the theoretical performance evaluation are described in the 
following subsections. 

2.1. State Transition 

The state diagram of the proposed accessing control system, which includes opening and closing 
actions, is depicted in Figure 2. The door open process identifies the detect targets as human by 
face/contour detection, and calculate the door access intention probability, called P access-, by analyzing 
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the corresponding trajectory. Once the P access is greater than a threshold thpr, the door is opened 
accordingly. On the other hand, an infrared sensor is added to make sure that nobody is passing or stayed 
before door closing, and then activate close action when the entrance/exit is clear for safety. 

Figure 1. System flowchart. 
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Figure 2. The state transition diagram of the system. 
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2.2. Human Detection 



In recent years, human detection techniques, especially those implemented by face detection 
strategies, have been successfully applied to many consumer products such as digital cameras, smart 
phones, or surveillance systems for detecting people [1,2]. In the proposed system, although both face 
detection and contour detection are adopted for human identification, the face detection is used as the 
primary solution and the contour detection based on head and shoulder shape, similar to Reference [3], is 
left as the optional solution for the application which has special privacy considerations. Thus, human 
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detection function is represented by face detection instead of contour detection in the 
following text. Researches of the face detection can be categorized into four types [4], namely, the 
knowledge-based, the feature-based, the template matching-based and the appearance-based methods. 

In the first method, the face is identified by face features constructed by relative locations and distances of 
eyes, nose, and mouth [5]. The detection rate is strongly affected by the orientation of the face. 

For feature-based approaches, face features are constructed from the statistics of the locations and the 
distances between the facial organs [6-9]. A face is detected by matching the regions with similar 
features and skin color information. The detection rate of this method is better than the previous one, but 
the performance may easily be influenced by the noises, shadows, and lighting conditions [4]. 

The template matching methods use predetermined facial template for matching. A face is identified 
once the Euclidean distance between the ROI and the template is less than a certain threshold value [10]. 
This approach is easy to implement but is not versatile for various types of face geometry. 

In the appearance-based methods, statistical analysis and machine learning techniques are used to 
find the best features of faces [1 1-13]. Although longer learning time and more samples are needed in 
the training process, these methods have high detection rate and low processing time and are more 
practical in real time applications [14] hence will be adopted in our study. 

The appearance-based methods can be divided into three types: linear/nonlinear projection, neural 
network, and Gabor filter/Haar-like feature selection. For example, the principal component 
analysis (PC A) [15], linear discriminate analysis (LDA) [16], and independent component analysis 
(ICA) [17] are popular projection based methods in which features extracted from face images are 
projected into a lower dimensional feature space where the most significant feature vectors are selected. 
Thereafter, an input sample vector with distance (e.g., Euclidean distance) shorter than a given threshold 
will be determined as a face image. Basically, performance of this type depends mainly on appropriate 
distance measurement. In contrast to the projection scheme, the neural network scheme uses huge face 
image samples for training (such as back propagation network), and then uses the trained network as 
classifier to detect human faces in the input image [11]. 

The Gabor filter method constructs a Gabor feature mask from input face sample images and uses it 
as face discrimination criterion. Although it has high detection rate and little influence by background 
illumination, the high computation complexity makes it be less practical [18]. Viola et al. proposed an 
Adaboost structure to overcome this drawback [12]. They use a Haar-like mask to train the face 
database, and determine the spatial feature masks of the face as weak classifiers. These weak classifiers 
are further cascaded (boosting) as a strong classifier that becomes a good face detector. Although the 
training process is time consuming, the Adaboost scheme is high in both detection rate and detection 
speed than neural network approach [12-14] thus is much practical for implementation in embedded 
system hence is used in the proposed system. 

2.3. Intention Analysis 

The intention analysis also includes two parts: trajectory tracking and estimation of the probability to 
access the door. The trajectory tracking is realized by comparing the overlapping area of the face images 
at two consecutive time instants against a threshold value. After that, people's intention of accessing the 
door is estimated by analyzing the probability distribution of movement in both vertical and horizontal 
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directions. The trajectories of movement in different directions from different locations are recorded first 
to establish the distribution of ROI. This probability value is then used to calculate the corresponding 
cumulative density function corresponding to the trajectory to determine whether or not the person has 
intention to access the door. 

2.3.1. Face Tracking via Spatial-Temporal Property 

For the commonly used frame rate greater than 15 frames/s, the overlapped part of the two face 
blocks at two consecutive time instants along the trajectories to the door would be large and the 
displacement between the block centers y^ -1 ) and (x* 9 y*) is assumed to be small. Thus, 

algorithm 1 is used to filter out the persons who seems having no intention to access the door. 



If (d = djdj x + dj < th d \ d y = \y z c - y z c x | 
then, the two face blocks match. 
Else, remove the block at time t-1. 

where d x = \x l c - x l ~\ d y = \y* - y^ -1 ! and thd is a given threshold value which is assigned to be the 
width of a nominal human face W t -i in our research as shown in Figure 3. 



2.3.2. Intention Analysis by Trajectory Statistics 

If a person within the ROI has intention to access the door, either the trajectory is characterized by 
continuous movement toward it or the person's face persists in the ROI for a certain period of time. 
Thus, the intention can be modeled by first recording the 2D trajectories in front of the door then taking 
projections along the directions vertical and parallel (or horizontal) to the door and finally conducting 
curve fitting by appropriate probability density functions (pdf) for the projection curves. 

A set of field recorded trajectories approaching the door is shown in Figure 4(a). Note that a person 
with the intention to access the door will move toward the door, the closer to the door the higher 
probability that the target has intention to access the door. 



Algorithm 1 



Figure 3. The spatial-temporal property of a moving face. 
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Figure 4. The trajectories and the pdfs (a) Trajectories approaching the door (b) Vertical 
projection of the trajectories (blue) and the corresponding pdf (red) (c) Horizontal projection of 
the trajectories (blue) and the corresponding pdf (red) (d) Probability table for the joint pdf. 
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Vertical and horizontal projections are conducted to obtain the corresponding ID histograms s v and Sh 
for x c and y c respectively, and is indicated by the blue lines in Figure 4(b,c). 

Note that the projection s v looks like a normal distribution while Sh is similar to a non-central 
Chi-square-distribution as indicated by the red dotted lines in Figure 4(b,c). Thus, we have: 

1 J-(x c -M)/2<T 2 ) 

(la) 



4~2 



KG 



^(v;;/ f ^) = ^- ( ^ )/2 (^ /4 - 1/2 A/ 2 - 1 (#Z) 



(lb) 



where x c = x c /I w and y c = y c x 24/I L represents the normalized coordinates, which I L and I w is the length 
and width of the image respectively. To enhance the difference of the probability value at different 
locations, s v and Sh is normalized to range from zero to one. Nonlinear least squares curve fitting [19] is 
then used to estimate the parameters of the pdf for s v and Sh. 

It is assumed, in this paper, that we have ju = 0.5 at the center point of the entrance image. It seems 
once a person has intention to go through the entrance, the moving trajectory is getting together toward 
the entrance. Thus, the width of s v increases (it means the variance o of s V9 increase also) in proportion 
to the size of door (DS). On the other hand, parameter X of Sh also increases when D, the distance 
between a pedestrian to the entrance, get increase. That is, a and X can be obtained by: 

a 1 = (DS/M) 2 



(2a) 
(2b) 



where M denotes the width of ROI. To collect more trajectories for further statistical analysis, o is 
multiplied by a factor k in this paper. 

The associated joint probability p(x c> y c ) is evaluated by means of the inclusion-exclusion 
principle [19] given in Equation (3), based on which a 3D probability table (PT) is constructed to 
simplify the computation as shown in Figure 4(d) for a nominal image of size 320 x 240: 

p(x c > y c ) = s v( x c) + s h(y c ) - s v( x c) x s h(y c ) (3) 

To determine whether or not a person has intention to go through the specific entrance/exit we use a 
sequence of T images after a human face is detected in the ROI and calculate the probability of the 
person's intention to access the door P access using Equation (4): 



Sensors 2013, IS 



5929 



P =—T T p(x',y') (4) 

access ji / * f-\ * ^ c 5 ✓ c ' V / 

If the face stays long enough during the time interval T, the value of P access will be high. An average 
value fipT of the joint pdf is adopted as the threshold ju PT , and is defined as: 

MpT = PT L xPT w ^<=° PT{hj) ( 5 ) 

where PTi and PT W represents the length and width of irrespectively, while PT(i,f) denotes the value 
of PTdX (ij) coordinate. 

It is observed that if one go through the entrance, the accumulated probability of his trajectory in PT 
region (the black dotted line in Figure 4(d)) will be greater than ju PT within T interval. Therefore, ju PT is 
used, in this paper, as the threshold th PP to determine whether one has intention to go through the 
entrance or not. That is, if P access is greater than jU PT , the "opening" command is activated. 

2.4. System Performance Evaluation 

The performance of the proposed system is evaluated by both detection rate and false rate FR tot . In 
this paper, the face detection is implemented, based on the Viola method [12], by a 10-layers cascade 
boosting structure. In each layer, the face detection rate is set to be greater than 0.99 at training stage. 
Thus, the theoretical overall face detection rate is about 0.99 10 ~ 0.9. On the other hand, the false rate 
FR tot can be defined by FR tot = FAR + FRR, where FAR and FRR are the false acceptance rate and the 
false rejection rate, respectively. In general, since a non-human face being erroneously identified as a 
human face will be filtered in the subsequent tracking procedure hence will not cause any false 
activation of the door, the FAR can be ignored and the only false rate that needs attention is FRR, or 
FRtot = FRR. From the experiments, it is observed that the rate of falsely activating the door is below 2 
in one million times, compared with the traditional systems that assume nearly 1 out of 5 or 20% of 
false activation. 

2.4.1. False Rejection Rate of the System 

The proposed method accumulates the number of the same face images detected during time interval 
Tto determine the intention of the target and decides positively if it is greater than th PT . Thus, the more 
the face images detected inside the dotted region (Figure 4(d)), the higher the probability that the target 
has intention to access the door and likewise for the miss. In other words, the more the missed face 
images, the higher the FRR. Thus, FRR can be formulated by Equation (6): 

NT 

FRR=Y, CfP{M F y x[l- P{M F )] NT -' (6) 

i=NF 

where NT = R D x T and NT = RF x th PP are the total number of images and the number of face images 
corresponding to the threshold th PP to activate the door during time interval T, respectively, Rd is the rate 
of processing a face image, which is five frames/second in our system, and P(M F ) is the probability of 
miss, as shown in Equation (7): 

P(M F ) = P(MR)xP(th PT ) (7) 
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where P(th PT ) denotes the ratio of area whose value is greater than th PT over the total area of PT 9 which 
is 0.54 in our experiments, and P(MR) is the miss rate of face detection defined by 
P(MR) = l-P(DR), with P(DR) being the detection rate, which is 0.9 in our system, as described 
previously. The theoretical estimation show that if T= 2 s, FRR will be close to 2.4 x 10~ 5 , but if T= 3 s, 
FRR will be 3.9 x 10" 8 and lower. 

3. Experiment Results and Discussion 

The proposed system can determine the user's intention and provide access to a door by detecting his 
position and tracking his trajectories of movement in the image sequence. To implement the system, a 
DSP based multimedia module, TI DM368, is selected as the hardware platform. The algorithms 
developed were programmed in C and executed under the Linux platform. The specifications of 
hardware and software are as follows: an ARM9 CPU with a speed of 900 MHz with embedded 
processing algorithm is used. The system prototype and the DSP platform board are shown in 
Figure 5(a,b). The image sensor resolution is 500 M and the test image size is 320 x 240. The other 
installation parameters are described as follows: the camera is installed on a wall at 2.2 m high and 
30° tilt. The size of ROI is 3.11 m (width) and 4.11 m (distance to door) respectively, as shown in 
Figure 5(c). The width of door is 1.6 m. Thus, according to Equation 2(a,b), a 2 of s v and X of Sh can be 
calculated as a = 0.26 and X = 4. 1 1 , respectively. The infrared signal scanning in ROI is indicated by the 
red area, as shown in Figure 5(d). The scanning width and distance are 3.1 1 m and 1.28 m (red line to 
door), respectively. 

Figure 5. Prototype of the proposed system and the installation parameters illustration, 
(a) and (b) are the system prototype appearance and designed hardware platform, 
respectively, (c) and (d) illustrate the system installation example and the area for infrared 
signal scanning. 




(a) (b) (c) (d) 

Demonstrations of processing results under different cases, including single entering, multiple 
entering, and just passing through, are shown in Figures 6-8. 

In the case of single person, it is observed in Figure 6(b-d), that the cumulative probability 
increases if a person continues walking toward the entrance so the face is detected. Once the 
cumulative probability is greater than the threshold value, the opening action is triggered as shown in 
Figure 6(d). In the multi-persons case of Figure 7, cumulative probability values for all detected faces 
are calculated independently. Thus, as long as one of the cumulative probability values exceeds the 
threshold, the door will be opened as shown in Figure 7(d). Finally, in Figure 8, it is observed that 
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although a person is detected in ROI, the door is not opened since the person is just passing by the 
entrance, thus the cumulative probability does not exceed the threshold value. To evaluate the 
performance of the proposed system, five different scenes including one lab and four business places are 
used, while the test condition includes day, night, indoor, and outdoor conditions. The 
four business places include two shops, one restaurant, and an office. Moreover, to illustrate the 
effect of the system, three of these places are recorded by a side view camera set and are shown in 
Figures 9 and 10. 

Figure 6. Opening activated by single person in lab (first row) and restaurant (second row) 
scenes: The cumulative probability increases if a person is continued walking toward the 
entrance thus the face is detected. Once the cumulative probability is greater than the 
threshold value, the opening action is triggered as shown in (d). 




Figure 7. Opening activated by multiple persons in lab and restaurant: The 
cumulative probability values for all detected faces are calculated independently. Thus, as 
long as one of the cumulative probability values exceeds the threshold, the door will be 
opened as in (d). 
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Figure 8. Illustration of "just passing by" case in lab and restaurant: It is observed that 
although a person is detected in ROI, the door is not opened since the person is just passing 
by the entrance, thus the cumulative probability does not exceed the threshold value. 




(a) (b) (c) (d) 

Figure 9. Illustrations of success door activation by person with accessing intention at 
3 business places. The first row: electronic materials store (day). The second row: wine store 
(night). The third row: business office (indoors). 




(a) (b) (c) (d) 

By observing the process from the embedded camera at the lab and restaurant places which are shown 
in Figures 6-8, it is also observed from Figures 9 and 10 that the door is correctly activated by the 
pedestrian who has intention to access, it but kept closed while he is just passing by or the image object 
is not a real person. Interested readers can access the recorded videos [20] to see the experimental 
process. The false rate result collected from the above places is presented in Table 1. 
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Figure 10. Illustrations of correct activation rejection at the same places. The first 
and second rows show the cases of a human passing by and stay of a lying dog, respectively. 
The third and fourth rows show the passing by conditions of a motorcycle and a human 
respectively. 






(a) (b) 


(c) 




(d) 




Table 1. The Experimental Results of False Rate. 








Results/Location 


No. of Persons Failure Count 


Detection Rate 




Electronic materials store (Banqiao city) 


52 


1 






Wine store (Yingge city) 


28 


0 


99.5% 


Real Test 


45 


0 


Office (Yangmei city) 






Restaurant (Xindian city) 


72 


0 




Lab. Test 


Lab. (Banqiao city) 


56 


0 


100% 


Total 




253 


1 


99.6% 



It is observed that the face detection function is hardly affected by the following cases such as 
wearing glasses, mask, hat, etc. The collected data, in 253 trials at five different locations, shows that the 
correct opening rate within the default 2 s is 0.996 (252/253) while the incorrect action number is only 1 . 
It is noted that although the FAR (0.004) is slightly higher than the theoretically predicted number, for 
the only one failure case, however, the door is still opened but just delayed for a short time (the response 
time is 4 s). Moreover, an interesting phenomenon is observed that although some people with access 
intention are walking dejected/passing with heads down thus no face is detected, finally they stop behind 
the close-door and look up for something to open the door. These actions, however, reveal their faces 
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which are detected by the camera and finally the opening action is activated after a few seconds delay. 
Therefore, there exists zero case when a person wanted to enter was rejected and it can reject the false 
opening actions for people passing, staying, and non-human cases, no matter in the day, night, indoor, or 
outdoor conditions. 

4. Conclusions and Future Works 

In this paper, an automatic door control system is implemented on a DSP platform. The system can 
first identify a target as person by face detection, and then analyze the path trajectory to determine 
whether the person has intention to access the door or not, thus to control the door accordingly. It is noted 
that the system has advantages of low false rate (near 0%), high correct activating rate (99.6%), and short 
response time (within 2 s) from detecting the target, confirming his intention, to activate the door 
opening. Moreover, via statistical analysis on detected face in consecutive time sequence, case of 
passing persons with missing face can still be confirmed within 4 s thus to activate the door correctly, as 
shown in Table 1 . 

To sum up, the proposed method builds up the statistic model of moving trajectories in ROI first, and 
the corresponding probability of a face at certain location can be obtained by lookup table. If the average 
probability of face trajectory is greater than th PT within T period, the person is said to have intention to go 
through the entrance. That is, region within dotted line as shown in Figure 4(d), whose pdf is greater than 
thpT, is viewed as key region while determining one's intention. If the face trajectory locates mostly 
within the key region, the person is said to have intention to go through the entrance. According to 

o 

Equation (6), if thppis set to be 0.8 (or 0.2) and Tis set by 2 s, the FR tot will become 0.18 (or 6.5 x 10 ). 
The response time to confirm one's intention will increase when thpr increase, while lower thpr value 
will result more false opening actions. It is observed, by our experiments, that if th PT is set to be the 
average of ROI, said ]Upt, the FR tot is about 6.3 x 10~ 5 and is suitable for most application cases. 

The proposed intelligent control system, compared with traditional ones, not only reduces the false 
action rate, but offers extra power saving benefits. For example, less false door activation reduces the 
energy exhaust of air conditioners as well as the door driver. Moreover, combining face recognition as 
well as behavior analysis algorithm into the built-in camera and DSPs module, it is possible to add 
disaster or crime prevention functions that thus can be applied to surveillance applications. To sum up, 
although the cost is somewhat higher than that of traditional systems, the proposed system should be 
viewed as, instead of an accessing control system only, a general purpose intelligence video surveillance 
(IVS) system as long as the detection function is replaced according to the related applications, for 
example, to detect the car, motorcycle, or detect the static object application in home healthcare [21] or 
pedestrians in a forbidden region by adopting the corresponding detection function and 3D probability 
model. Some of the extended functions such as plate detection, pedestrian detection, and flood detection, 
had been successfully examined in the proposed platform [22,23], while other functions with huge 
complexity, e.g., fire detection, can also be fulfilled using a similar hardware structure but replacing a 
faster DSP module. That is, the proposed system is designed not only as an access control system but 
also a watch dog in front of the entrance, thus, the CP value of the proposed system is much greater than 
that of the existing systems. 
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